This project analyzes the popularity of programming languages over time by processing and visualizing post data. The dataset records how many posts were made for each programming language tag on each date, and the goal is to reshape, clean, and plot the data to identify trends.
- Data Preprocessing:
  - Reading a CSV file with programming language posts and reshaping it to analyze trends.
  - Cleaning data by handling missing values.
  - Manipulating the data to examine the relationship between programming languages and their popularity over time.
- Data Visualization:
  - Plotting the number of posts for each programming language across different dates.
  - Applying a rolling mean to smooth out time series data and make trends more apparent.
- Data Export:
  - Saving the reshaped data and rolling mean data to CSV files for further use.
The script begins by reading the dataset QueryResults.csv into a pandas DataFrame for easy manipulation. During loading, the columns are renamed to more meaningful names: DATE, TAG, and POSTS.
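A minimal sketch of the loading step, assuming the file has one header row followed by three columns. The in-memory sample below is a made-up stand-in for QueryResults.csv (its header names and rows are hypothetical), so the snippet runs without the real file:

```python
import io

import pandas as pd

# Hypothetical stand-in for QueryResults.csv: the header names and the
# rows below are invented for illustration only.
sample_csv = io.StringIO(
    "col_a,col_b,col_c\n"
    "2008-07-01 00:00:00,c#,3\n"
    "2008-08-01 00:00:00,c#,511\n"
    "2008-08-01 00:00:00,python,124\n"
    "2008-09-01 00:00:00,python,542\n"
)

# header=0 discards the file's own header row; names= supplies the new labels.
df = pd.read_csv(sample_csv, names=["DATE", "TAG", "POSTS"], header=0)

print(df.head())
print(df.dtypes)  # DATE has not been converted to a datetime type yet
```

With the real dataset, the `io.StringIO` object would be replaced by the path `"QueryResults.csv"`.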
- Datetime Conversion:
  - The DATE column, initially in string format, is converted to a datetime type to allow for proper time-based analysis.
- Reshaping the Data:
  - The data is pivoted, with dates as rows and programming language tags as columns, displaying the number of posts for each language over time.
- Data Cleaning:
  - Missing values (NaN) are replaced with zeros to ensure that the data is complete and ready for analysis.
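The three preprocessing steps above might look roughly like this in pandas. The sample rows are made up, and the variable name `reshaped_df` is an assumption rather than a name taken from the script:

```python
import pandas as pd

# Made-up sample in the DATE / TAG / POSTS layout described above.
df = pd.DataFrame(
    {
        "DATE": [
            "2008-07-01 00:00:00",
            "2008-08-01 00:00:00",
            "2008-08-01 00:00:00",
            "2008-09-01 00:00:00",
            "2008-09-01 00:00:00",
        ],
        "TAG": ["c#", "c#", "python", "c#", "python"],
        "POSTS": [3, 511, 124, 1649, 542],
    }
)

# 1. Datetime conversion: strings become datetime64 values.
df["DATE"] = pd.to_datetime(df["DATE"])

# 2. Reshaping: one row per date, one column per tag.
reshaped_df = df.pivot(index="DATE", columns="TAG", values="POSTS")

# 3. Cleaning: a tag with no posts on a given date shows up as NaN after
#    the pivot, so those cells are filled with zero.
reshaped_df = reshaped_df.fillna(0)

print(reshaped_df)
```

In this sample, python has no row for 2008-07-01, so the pivot leaves that cell empty and `fillna(0)` turns it into a zero.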
- A line chart is generated to plot the number of posts for each programming language over time.
- Each language is represented by a separate line on the chart, allowing for easy comparison of their popularity.

- Rolling Mean: a rolling mean is applied to smooth the time series and make trends more apparent.
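The line chart and the rolling mean could be sketched as follows. The sample data is invented, and the window size of 6 is an assumption, since this README does not state the value the script uses:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required

import matplotlib.pyplot as plt
import pandas as pd

# Made-up pivoted data: dates as the index, one column per language tag.
dates = pd.date_range("2020-01-01", periods=12, freq="MS")
reshaped_df = pd.DataFrame(
    {
        "python": [float(10 * i) for i in range(12)],
        "java": [float(120 - 5 * i) for i in range(12)],
    },
    index=dates,
)

WINDOW = 6  # assumed window size; larger values give smoother lines

# Each value is the mean of the current row and the WINDOW - 1 rows before it.
roll_df = reshaped_df.rolling(window=WINDOW).mean()

fig, (ax_raw, ax_roll) = plt.subplots(2, 1, sharex=True, figsize=(10, 8))
for tag in reshaped_df.columns:
    ax_raw.plot(reshaped_df.index, reshaped_df[tag], label=tag)
    ax_roll.plot(roll_df.index, roll_df[tag], label=tag)

ax_raw.set_ylabel("Number of posts")
ax_roll.set_ylabel("Rolling mean of posts")
ax_roll.set_xlabel("Date")
ax_raw.legend()
ax_roll.legend()
```

The first `WINDOW - 1` rows of `roll_df` are NaN because a full window is not yet available, so the smoothed lines start later than the raw ones. In an interactive session, `plt.show()` displays the figure once the `matplotlib.use("Agg")` line is removed.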
- The reshaped data is saved to a new CSV file (reshaped_data.csv).
- The rolling mean data is also saved to a separate CSV file (rolling_data.csv) for further analysis.
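A possible version of the export step is shown below. The file names come from this README, while the sample data, the window size of 2, and the temporary output directory are assumptions made so the sketch does not overwrite anything:

```python
import os
import tempfile

import pandas as pd

# Made-up pivoted data and its rolling mean (window size is an assumption).
dates = pd.date_range("2020-01-01", periods=4, freq="MS")
reshaped_df = pd.DataFrame(
    {"python": [10.0, 20.0, 30.0, 40.0], "java": [40.0, 30.0, 20.0, 10.0]},
    index=dates,
)
roll_df = reshaped_df.rolling(window=2).mean()

# Write into a temporary directory; the real script may write elsewhere.
out_dir = tempfile.mkdtemp()
reshaped_path = os.path.join(out_dir, "reshaped_data.csv")
rolling_path = os.path.join(out_dir, "rolling_data.csv")

# The date index is written as the first column of each file.
reshaped_df.to_csv(reshaped_path)
roll_df.to_csv(rolling_path)

# Reading a file back restores the date index for further analysis.
restored = pd.read_csv(reshaped_path, index_col=0, parse_dates=True)
print(restored)
```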
- Python 3.x
- pandas
- matplotlib
You can install the required libraries using pip:
pip install pandas matplotlib


